Extracting a PP Attachment Data Set from a German Dependency Treebank Using Topological Fields
Abstract
PP-attachment has traditionally been tackled as a binary classification task where a preposition is attached to the immediately preceding noun or to the main verb. In this paper, we provide an analysis of PP-attachment in German to show that the assumption that prepositions have only two head candidates does not hold. We propose a realistic PP-attachment data set, in which each preposition has multiple head candidates. The data set is extracted automatically from a dependency treebank with topological field annotations. Finally, we show that the task of PP-attachment is substantially more difficult with this realistic data set than with a binary classification data set.
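The difference between a binary instance and a multi-candidate instance can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not the extraction procedure of the paper: the `Token` fields, the toy sentence and its analysis, the field labels (VF, LK, MF, VC), and the candidate rule (every noun or full verb that is not the preposition's own dependent) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Token:
    idx: int    # 1-based position in the sentence
    form: str
    pos: str    # coarse POS tag (illustrative tag set)
    head: int   # index of the gold head, 0 for the root
    field: str  # topological field label (illustrative values)


# Toy sentence with a hand-written analysis (hypothetical, not treebank data):
# "Sie hat den Hund der Nachbarin mit dem Fernglas gesehen."
SENTENCE = [
    Token(1, "Sie", "PRON", 2, "VF"),
    Token(2, "hat", "AUX", 0, "LK"),
    Token(3, "den", "DET", 4, "MF"),
    Token(4, "Hund", "NOUN", 10, "MF"),
    Token(5, "der", "DET", 6, "MF"),
    Token(6, "Nachbarin", "NOUN", 4, "MF"),
    Token(7, "mit", "ADP", 10, "MF"),
    Token(8, "dem", "DET", 9, "MF"),
    Token(9, "Fernglas", "NOUN", 7, "MF"),
    Token(10, "gesehen", "VERB", 2, "VC"),
]


def candidate_heads(sentence, prep):
    """Hypothetical rule: every noun or full verb that does not depend
    directly on the preposition itself is a head candidate."""
    return [
        (t.idx, t.field)
        for t in sentence
        if t.pos in {"NOUN", "VERB"} and t.head != prep.idx
    ]


def extract_instances(sentence):
    """Build one instance per preposition whose gold head is a candidate."""
    instances = []
    for tok in sentence:
        if tok.pos != "ADP":
            continue
        cands = candidate_heads(sentence, tok)
        if tok.head in {idx for idx, _ in cands}:
            instances.append(
                {"prep": tok.idx, "candidates": cands, "gold": tok.head}
            )
    return instances


instances = extract_instances(SENTENCE)
```

In this toy sentence the preposition "mit" receives three head candidates ("Hund", "Nachbarin", "gesehen"), whereas a binary formulation would consider only the immediately preceding noun and the main verb.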
Similar resources
How Bad Is The Problem Of PP-Attachment? A Comparison Of English, German And Swedish
The correct attachment of prepositional phrases (PPs) is a central disambiguation problem in parsing natural languages. This paper compares the baseline situation in English, German and Swedish based on manual PP attachments in various treebanks for these languages. We argue that cross-language comparisons of the disambiguation results in previous research are impossible because of the different...
What Treebanks Can Do For You: Rule-based and Machine-learning Approaches to Anaphora Resolution in German
This paper compares two approaches to computational anaphora resolution for German: (i) an adaptation of the rule-based RAP algorithm that was originally developed for English by Lappin and Leass, and (ii) a hybrid system for anaphora resolution that combines a rule-based pre-filtering component with a memory-based resolution module. The data source is provided by the TüBa-D/Z treebank of German...
An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies
A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...
Serialising the ISO SynAF Syntactic Object Model
This paper introduces , an XML format developed to serialise the object model defined by the ISO Syntactic Annotation Framework SynAF. Based on widespread best practices we adapt a popular XML format for syntactic annotation, TigerXML, with additional features to support a variety of syntactic phenomena including constituent and dependency structures, binding, and different node types ...